
Substring filtering for low-cost linked data interfaces

Authors: E Minack, I Ermilov, J Van Herwegen, L Rietveld, M Arias Gallego, M Nelson, MP Ferguson, NR Brisaboa, O Erling, R Li, R Verborgh, S van Hooland
Publisher: Springer Science and Business Media LLC
Publication date: 01/01/2015

Recently, Triple Pattern Fragments (TPF) were introduced as a low-cost server-side interface for settings where high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of longer query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them must be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPF interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elasticsearch and a case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters, such as complete regular expressions or range queries.

Sources: Crossref, Ghent University Academic Bibliography
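The difference between client-side filtering and server-side substring matching described in the substring-filtering abstract can be illustrated with a minimal Python sketch. It assumes triples are plain `(subject, predicate, object)` string tuples with literals stored as bare strings, and the names `match_pattern`, `substring_fragment` and the `ex:` data are hypothetical. It uses a linear scan rather than Elasticsearch or an FM-index, so it shows only what the feature computes, not how the paper's implementations index it.

```python
from typing import Iterator, List, Optional, Tuple

# A triple as (subject, predicate, object); literals are plain strings (simplification).
Triple = Tuple[str, str, str]


def match_pattern(triples: List[Triple], s: Optional[str] = None,
                  p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a triple pattern; None plays the role of a variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def substring_fragment(triples: List[Triple], s: Optional[str] = None,
                       p: Optional[str] = None, substring: Optional[str] = None) -> List[Triple]:
    """Hypothetical server-side fragment: a triple pattern plus a
    case-insensitive substring condition on the object literal."""
    needle = substring.lower() if substring else None
    return [t for t in match_pattern(triples, s, p)
            if needle is None or needle in t[2].lower()]


# Hypothetical example data.
DATA: List[Triple] = [
    ("ex:ghent", "rdfs:label", "Ghent"),
    ("ex:brussels", "rdfs:label", "Brussels"),
    ("ex:antwerp", "rdfs:label", "Antwerp"),
    ("ex:ghent_university", "rdfs:label", "Ghent University"),
]

# Client-side filtering: every rdfs:label triple is transferred, then filtered locally.
transferred = list(match_pattern(DATA, p="rdfs:label"))
client_result = [t for t in transferred if "ghent" in t[2].lower()]

# Server-side substring matching: only the matching triples are transferred.
server_result = substring_fragment(DATA, p="rdfs:label", substring="GHENT")
```

Both approaches return the same two triples, but the client-side variant has to download all four label triples first; on a large dataset that transfer is what makes filter-heavy queries slow over a plain TPF interface.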

DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data

Authors: A.-C. Ngonga Ngomo, B. Bishop, C. Bizer, E. Minack, F. Belleau, J. Broekstra, J. Lehmann, S. Auer, Z. Pan
Publisher: Springer Science and Business Media LLC
Publication date: 01/01/2011

Triple stores are the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and thus settled on measuring performance against a relational database that had been converted to RDF using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering, and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triple stores, and we provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. A subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is far less homogeneous than suggested by previous benchmarks.

Sources: CiteSeerX, Crossref
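The SPARQL feature analysis step mentioned in the DBpedia SPARQL Benchmark abstract can be illustrated with a minimal Python sketch that tags logged queries with the SPARQL keywords they contain and groups queries that share the same feature set. The feature list, the keyword-matching approach, and the names `features_of` and `group_by_features` are assumptions for illustration. Grouping by identical feature sets is a simplified stand-in, not the clustering method the paper uses.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Illustrative feature list (an assumption, not the paper's exact set).
FEATURES = ["UNION", "OPTIONAL", "FILTER", "DISTINCT", "REGEX",
            "LANG", "ORDER BY", "LIMIT", "OFFSET"]


def features_of(query: str) -> FrozenSet[str]:
    """Return the SPARQL keywords present in a query (whole words, case-insensitive)."""
    found = set()
    for feat in FEATURES:
        # Allow arbitrary whitespace inside multi-word keywords such as ORDER BY.
        pattern = r"\b" + feat.replace(" ", r"\s+") + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            found.add(feat)
    return frozenset(found)


def group_by_features(queries: List[str]) -> Dict[FrozenSet[str], List[str]]:
    """Group logged queries by their feature set (hypothetical stand-in for clustering)."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for q in queries:
        groups[features_of(q)].append(q)
    return dict(groups)


# Hypothetical query-log excerpt.
QUERIES = [
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
    "SELECT DISTINCT ?l WHERE { ?s rdfs:label ?l FILTER(lang(?l) = 'en') }",
    "SELECT ?s WHERE { { ?s a dbo:City } UNION { ?s a dbo:Town } }",
    "SELECT ?x WHERE { ?x ?p ?o } limit 5",
]
```

A real log would need query normalization (prefix expansion, literal and IRI abstraction) before grouping; the keyword scan above can also produce false positives when a keyword appears inside a string literal.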

On the Complexity of Query Result Diversification

Authors: Abiteboul S., Adomavicius G., Agrawal R., Amer-Yahia S., Berbeglia G., Borodin A., Capannini G., Chen Z., Demidova E., Deng T., Drosou M., Durand A., Fagin R., Fraternali P., Gollapudi S., Hemaspaandra L. A., Ilyas I. F., Jin W., Koutrika G., Ladner R. E., Lappas T., Li C., Liu Z., Minack E., Prokopyev O. A., Schnaitter K., Stefanidis K., Valiant L., Vardi M. Y., Vee E., Vieira M. R., Xie M., Yu C., Yu C., Zhang M., Ziegler C.-N.
Publication date: 01/01/2013

Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q, and a positive integer k, the task is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query and, at the same time, as diverse as possible from each other. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval, and operations research. This paper studies the complexity of result diversification for relational queries. We identify three problems in connection with query result diversification: determining whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, assessing the rank of a given k-element set, and counting how many k-element sets are ranked above a given bound. We study these problems for a variety of query languages and for three objective functions. We establish matching upper and lower bounds for these problems, for both combined complexity and data complexity. We also investigate several special settings of these problems, identifying tractable cases.

Sources: CiteSeerX, Crossref, Edinburgh Research Explorer
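The three problems named in the query result diversification abstract (deciding whether some k-set reaches a bound, ranking a given k-set, and counting k-sets above a bound) can be made concrete with a brute-force Python sketch over the k-element subsets of Q(D). The objective used here, (1 - λ) times total relevance plus λ times total pairwise diversity, is an assumed illustrative choice and not necessarily one of the paper's three objective functions; the rank convention (1 plus the number of strictly better sets), the names `score`, `exists_above`, `rank_of`, `count_above`, and the example data are likewise hypothetical.

```python
from itertools import combinations

# Assumed objective (illustrative): (1 - lam) * total relevance + lam * pairwise diversity.
def score(subset, rel, dis, lam=0.5):
    relevance = sum(rel[t] for t in subset)
    diversity = sum(dis(a, b) for a, b in combinations(subset, 2))
    return (1 - lam) * relevance + lam * diversity


def exists_above(results, k, bound, rel, dis):
    """Decision problem: does some k-element subset score at least `bound`?"""
    return any(score(u, rel, dis) >= bound for u in combinations(results, k))


def rank_of(subset, results, rel, dis):
    """Rank problem: 1 + number of same-size subsets with a strictly higher score."""
    target = score(subset, rel, dis)
    return 1 + sum(1 for u in combinations(results, len(subset))
                   if score(u, rel, dis) > target)


def count_above(results, k, bound, rel, dis):
    """Counting problem: how many k-element subsets score at least `bound`?"""
    return sum(1 for u in combinations(results, k) if score(u, rel, dis) >= bound)


# Hypothetical Q(D): tuple ids whose first letter stands for a category.
RESULTS = ["a1", "a2", "b1", "c1"]
REL = {"a1": 0.9, "a2": 0.8, "b1": 0.5, "c1": 0.3}


def category_distance(x, y):
    """Hypothetical diversity: 1 if the tuples are in different categories, else 0."""
    return 0.0 if x[0] == y[0] else 1.0
```

With k = 2, the two most relevant tuples ("a1", "a2") share a category and therefore rank last, while ("a1", "b1") trades a little relevance for diversity and ranks first. Because every function enumerates all C(|Q(D)|, k) subsets, the sketch is exponential in k in general; it only illustrates the problem definitions, not the paper's complexity analysis.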